workload-replay: require the parser by default for query literals#36799

Closed
jasonhernandez wants to merge 2 commits into workload-anonymize-subsource-keys from workload-anonymize-require-parser

Conversation

@jasonhernandez
Contributor

Fifth in the stack; base branch: workload-anonymize-subsource-keys (#36797).

Change

Flips the anonymizer to fail closed on literal redaction. Previously, if the mz-sql-anonymize binary was missing (or an individual captured statement didn't parse), the tool fell back, with only a warning, to a regex that redacts nothing but single-quoted strings. Numbers, dollar-quoted strings, and comments in query SQL were left exposed. For a privacy tool, fail-open is the wrong default.
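To make the gap concrete, here is a minimal Python sketch. `SINGLE_QUOTED` and `regex_redact` are hypothetical stand-ins that are only assumed to approximate the fallback's behavior. They are not the tool's actual code.

```python
import re

# Hypothetical approximation of the regex fallback: it matches only
# single-quoted SQL literals (a doubled '' is an escaped quote inside one).
SINGLE_QUOTED = re.compile(r"'(?:[^']|'')*'")


def regex_redact(sql: str) -> str:
    """Replace every single-quoted literal with a placeholder."""
    return SINGLE_QUOTED.sub("'<redacted>'", sql)


query = (
    "SELECT * FROM orders "
    "WHERE email = 'alice@example.com' "  # single-quoted: redacted
    "AND amount > 4200 "                  # numeric literal: left as-is
    "AND note = $$secret note$$ "         # dollar-quoted string: left as-is
    "-- customer 12345"                   # comment: left as-is
)

print(regex_redact(query))
```

Only the email address is masked. The amount, the dollar-quoted note, and the comment all survive, and this is the leak the fail-closed default is meant to prevent.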

New --require-parser flag, on by default:

  • Parser binary unavailable → error, with build instructions, instead of regex fallback.
  • Parser present but some captured queries don't parse → error, reporting how many.
  • --no-require-parser opts back into the regex fallback (which now warns it's active).
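A paired on/off flag like this maps naturally onto `argparse.BooleanOptionalAction` (Python 3.9+). The sketch below assumes that approach. The flag spellings and binary name come from the description above. `build_arg_parser`, `select_redactor`, `main`, and the message wording are hypothetical, and the real error prints its own build instructions.

```python
import argparse
import shutil
import sys
from typing import List, Optional

# Binary name from the PR description; everything else here is hypothetical.
PARSER_BINARY = "mz-sql-anonymize"


def build_arg_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="anonymize")
    # BooleanOptionalAction registers both --require-parser and
    # --no-require-parser; default=True makes fail-closed the default.
    p.add_argument(
        "--require-parser",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="error out instead of falling back to regex-only redaction",
    )
    return p


def select_redactor(require_parser: bool, binary_path: Optional[str]) -> str:
    """Return "parser" or "regex", or exit if the parser is required but missing."""
    if binary_path is not None:
        return "parser"
    if require_parser:
        # Placeholder text; the real tool includes build instructions here.
        raise SystemExit(
            f"error: {PARSER_BINARY} not found; build it, or pass "
            "--no-require-parser to accept weaker regex redaction"
        )
    # Opted out: proceed, but say loudly that redaction is weaker.
    print(
        "warning: regex fallback active; numbers, dollar-quoted strings, "
        "and comments will not be redacted",
        file=sys.stderr,
    )
    return "regex"


def main(argv: Optional[List[str]] = None) -> str:
    args = build_arg_parser().parse_args(argv)
    # shutil.which returns None when the binary is not on PATH.
    return select_redactor(args.require_parser, shutil.which(PARSER_BINARY))
```

Because the missing-binary case raises unless the user explicitly opts out, the weaker output can only ever be produced on request.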

Also moved output-target resolution to the top of main() so an invalid invocation (no -o, no --in-place) fails immediately, independent of the parser check.
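A minimal sketch of the fail-fast ordering, assuming an argparse-based `main()`. `resolve_output_target` is hypothetical, and so is the choice that `--in-place` means "overwrite the input file" and takes precedence over `-o`.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def resolve_output_target(output: Optional[str], in_place: bool, input_path: str) -> Path:
    """Pick the output path, or exit immediately if none was given."""
    # Assumption: --in-place overwrites the input and wins over -o.
    if in_place:
        return Path(input_path)
    if output is not None:
        return Path(output)
    raise SystemExit("error: no output target; pass -o PATH or --in-place")


def main(argv: Optional[List[str]] = None) -> Path:
    p = argparse.ArgumentParser(prog="anonymize")
    p.add_argument("input")
    p.add_argument("-o", dest="output")
    p.add_argument("--in-place", action="store_true")
    args = p.parse_args(argv)
    # Resolved first, so an invalid invocation fails before the parser-binary
    # check or any anonymization work runs.
    target = resolve_output_target(args.output, args.in_place, args.input)
    # ... parser check and anonymization would follow here ...
    return target
```

With this ordering, a user who forgets `-o` gets the same error whether or not the parser binary is built.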

Behavior summary

| Situation | Before | After (default) | --no-require-parser |
|---|---|---|---|
| Binary missing | regex + warning | error | regex + warning |
| Statement doesn't parse | regex for that stmt | error | regex for that stmt |
| Binary present, all parse | parser | parser | parser |

Testing

  • 20 tests pass. New test_require_parser_errors_without_binary covers the fail-closed default; the existing regex-path tests now pass --no-require-parser via the run_tool helper (which defaults to it so tests are deterministic whether or not the binary is built).
  • bin/fmt and ruff are clean.
  • README updated with the new default and flag.
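The `run_tool` default described in the testing notes might look roughly like this. `TOOL_CMD` and `build_argv` are hypothetical, and the real helper's signature may differ.

```python
import subprocess
import sys
from typing import List

# Hypothetical command for invoking the anonymizer; the real entry point may differ.
TOOL_CMD = [sys.executable, "anonymize.py"]


def build_argv(*args: str, require_parser: bool = False) -> List[str]:
    argv = [*TOOL_CMD, *args]
    if not require_parser:
        # Opt into the regex fallback so the outcome does not depend on
        # whether the mz-sql-anonymize binary has been built locally.
        argv.append("--no-require-parser")
    return argv


def run_tool(*args: str, require_parser: bool = False) -> subprocess.CompletedProcess:
    """Run the tool and capture its output without raising on failure."""
    return subprocess.run(
        build_argv(*args, require_parser=require_parser),
        capture_output=True,
        text=True,
        check=False,
    )
```

A fail-closed test such as `test_require_parser_errors_without_binary` would then call `run_tool(..., require_parser=True)` in an environment without the binary and assert a nonzero exit code.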

🤖 Generated with Claude Code

jasonhernandez and others added 2 commits May 29, 2026 13:16
Previously, if the mz-sql-anonymize binary was missing or a statement
failed to parse, the anonymizer silently fell back to a regex that only
redacts single-quoted strings, leaving numbers, dollar-quoted strings, and
comments in query SQL exposed. For a privacy tool, fail-open is the wrong
default.

Add --require-parser (default on). When set, the tool errors rather than
emit weaker output if the parser binary is unavailable or any captured query
does not parse. --no-require-parser opts back into the regex fallback, which
now warns that it is in use.

Also resolve the output target up front so an invalid invocation fails
before any work, independent of the new parser check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per-statement parse failures are a property of the captured SQL, not of
whether the parser is present, so they fall back to the regex with a warning
in both modes (the verify pass still scans them). --require-parser now errors
only when the parser binary itself is unavailable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
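After this second commit, only a missing binary is fatal. An unparseable statement falls back to the regex in either mode. Here is a minimal sketch of that decision, assuming the parser is wrapped in a callable that returns `None` on a parse failure. `anonymize_statements` and its parameters are hypothetical, and the verify pass is not modeled.

```python
import sys
from typing import Callable, Iterable, List, Optional


def anonymize_statements(
    statements: Iterable[str],
    parse_and_redact: Optional[Callable[[str], Optional[str]]],
    regex_redact: Callable[[str], str],
    require_parser: bool = True,
) -> List[str]:
    """parse_and_redact is None when the binary is unavailable; otherwise it
    returns the redacted statement, or None if the statement fails to parse."""
    if parse_and_redact is None:
        # Missing binary: the only case --require-parser turns into an error.
        if require_parser:
            raise SystemExit(
                "error: mz-sql-anonymize unavailable; "
                "pass --no-require-parser to use the regex fallback"
            )
        print("warning: regex fallback active for all statements", file=sys.stderr)
        return [regex_redact(s) for s in statements]

    out: List[str] = []
    failed = 0
    for stmt in statements:
        redacted = parse_and_redact(stmt)
        if redacted is None:
            # Per-statement parse failure: same fallback in both modes.
            failed += 1
            redacted = regex_redact(stmt)
        out.append(redacted)
    if failed:
        print(f"warning: {failed} statement(s) did not parse; used regex fallback",
              file=sys.stderr)
    return out
```

Note that this supersedes the "Statement doesn't parse → error" row in the behavior summary of the PR description.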
@jasonhernandez
Contributor Author

Superseded by #36803, which squashes this stack into a single PR against main.
